Horvitz-Thompson estimators for functional data: asymptotic 
confidence bands and optimal allocation for stratified sampling 






Herve Cardot, Etienne Josserand 
email : herve.cardot@u-bourgogne.fr, etienne.josserand@u-bourgogne.fr 
Institut de Mathematiques de Bourgogne, UMR CNRS 5584, Universite de Bourgogne 
9 Avenue Alain Savary - B.P. 47870, 21078 DIJON Cedex - France 

September 30, 2010 



Abstract 

When dealing with very large datasets of functional data, survey sampling approaches
are useful in order to obtain estimators of simple functional quantities, without
being obliged to store all the data. We propose here a Horvitz-Thompson estimator of
the mean trajectory. In the context of a superpopulation framework, we prove under
mild regularity conditions that we obtain uniformly consistent estimators of the mean
function and of its variance function. With additional assumptions on the sampling
design we state a functional Central Limit Theorem and deduce asymptotic confidence
bands. Stratified sampling is studied in detail, and we also obtain a functional version
of the usual optimal allocation rule considering a mean variance criterion. These
techniques are illustrated by means of a test population of N = 18902 electricity meters
for which we have individual electricity consumption measures every 30 minutes over
one week. We show that stratification can substantially improve the accuracy of
the estimators and reduce the width of the global confidence bands compared to simple
random sampling without replacement.


keywords. Asymptotic variance; Functional Central Limit Theorem; Superpopulation 
model; Supremum of Gaussian processes; Survey sampling. 


1 Introduction 


The development of distributed sensors has enabled access to potentially huge databases 
of signals evolving along time and observed on very fine scales. Exhaustive collection of 
such data would require major investments, both for transmission of the signals through 
networks and for storage. As noted in Chiky & Hebrail (2008), survey sampling of the 
sensors, which entails randomly selecting only a part of the curves of the population and 
which represents a trade-off between limited storage capacities and the accuracy of the data,
may be relevant compared to signal compression in order to obtain accurate approximations
to simple functional quantities such as mean trajectories. 

Our study is motivated by the estimation, in a fixed time interval, of the mean elec- 
tricity consumption curve of a large number of consumers. The French electricity operator 
EDF, Electricite De France, intends over the next few years to install over 30 million elec- 
tricity meters, in each firm and household, which will be able to send individual electricity 
consumption measures on very fine time scales. Collecting, saving and analyzing all this 
information, which may be considered as functional, would be very expensive. As an il- 
lustrative example, a sample of 20 individual curves, selected among a test population of 
N = 18902 electricity meters, is plotted in Figure 1. The curves consist, for each company
selected, of the electricity consumption measured every 30 minutes over a period of one 
week. The target is the mean population curve, and we note the high variability between 
individuals. 

Using survey sampling strategies is one way to get accurate estimates at reasonable 
cost. The main questions addressed in this paper are to determine the precision of a survey 
sampling strategy and the strategies likely to improve the sampling selection process in 
order to obtain estimators that are as accurate as possible and to derive global confidence 
bands that are as sharp as possible for stratified sampling. There is a vast literature in 
survey sampling theory; see for example Fuller (2009). However, as far as we know,
the convergence issue with such sampling strategies in finite population has not yet been
studied in the functional data analysis literature (Ramsay & Silverman, 2005; Müller, 2005)
except by Cardot et al. (2010), where the objective was to reduce the dimension of the data 
through functional principal components in the Hilbert space of square integrable functions. 
Here we adopt a different point of view and consider the sampled trajectories as elements 
of the space of continuous functions equipped with the usual sup norm in order to get 
uniform consistency results through maximal inequalities. Then, it is possible to build 
global confidence bands with the help of properties of suprema of Gaussian processes and 
the functional central limit theorem. 

2 Notation, estimators and basic properties 

Let us consider a finite population $U_N = \{1, \ldots, N\}$ of size $N$, and suppose that to
each unit $k$ in $U_N$ we can associate a unique function $Y_k(t)$, for $t \in [0,T]$, with
$T < \infty$. Our target is the mean trajectory

$$\mu_N(t) = \frac{1}{N} \sum_{k \in U_N} Y_k(t), \qquad t \in [0,T]. \tag{1}$$

We consider a sample $s$ drawn from $U_N$ according to a fixed-size sampling design $p_N(s)$,
where $p_N(s)$ is the probability of drawing the sample $s$. The size $n_N$ of $s$ is nonrandom and
we suppose that the first and second order inclusion probabilities satisfy $\pi_k = P(k \in s) > 0$,
for all $k \in U_N$, and $\pi_{kl} = P(k \,\&\, l \in s) > 0$ for all $k, l \in U_N$, $k \neq l$, so that each unit and
each pair of units can be drawn with a non null probability from the population.

Figure 1: A sample of 20 individual electricity consumption curves. The mean profile is
plotted in bold line.

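It is now possible to write the classical Horvitz-Thompson estimator of the mean curve,

$$\hat\mu_N(t) = \frac{1}{N} \sum_{k \in s} \frac{Y_k(t)}{\pi_k} = \frac{1}{N} \sum_{k \in U_N} \frac{Y_k(t)}{\pi_k}\, I_k, \qquad t \in [0,T], \tag{2}$$

where $I_k$ is the sample membership indicator, $I_k = 1$ if $k \in s$ and $I_k = 0$ otherwise. We
clearly have $E(I_k) = \pi_k$ and $E(I_k I_l) = \pi_{kl}$.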
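As a minimal illustration of estimator (2), the sketch below computes the Horvitz-Thompson mean curve on a toy population observed on a common grid. The population values, the function name `horvitz_thompson_mean` and the choice of simple random sampling without replacement (for which every inclusion probability equals n/N) are assumptions made for the example, not part of the study.

```python
import random

def horvitz_thompson_mean(curves, sample, incl_prob):
    """Horvitz-Thompson estimate of the mean curve, following equation (2).

    curves    : list of N trajectories, each a list of d values Y_k(t_j)
    sample    : indices k of the sampled units
    incl_prob : first order inclusion probabilities pi_k, one per unit
    """
    N = len(curves)
    d = len(curves[0])
    estimate = [0.0] * d
    for k in sample:
        weight = 1.0 / incl_prob[k]          # each sampled curve is weighted by 1 / pi_k
        for j in range(d):
            estimate[j] += weight * curves[k][j]
    return [value / N for value in estimate]

# Toy population (hypothetical values, not the electricity data).
rng = random.Random(0)
N, n, d = 200, 50, 5
curves = [[rng.gauss(10.0, 2.0) + j for j in range(d)] for _ in range(N)]

# Simple random sampling without replacement: pi_k = n / N for every unit.
sample = rng.sample(range(N), n)
incl_prob = [n / N] * N

mu_hat = horvitz_thompson_mean(curves, sample, incl_prob)
mu_true = [sum(c[j] for c in curves) / N for j in range(d)]
```

With equal inclusion probabilities the estimator reduces to the plain sample mean curve, and with a census (all units sampled, all probabilities equal to one) it returns the population mean exactly.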

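It is easy to check (Fuller, 2009) that this estimator is unbiased, i.e. for all $t \in [0,T]$,
$E\{\hat\mu_N(t)\} = \mu_N(t)$. Its covariance function $\gamma_N(s,t) = \mathrm{cov}\{\hat\mu_N(s), \hat\mu_N(t)\}$ satisfies, for all
$(s,t) \in [0,T] \times [0,T]$,

$$\gamma_N(s,t) = \frac{1}{N^2} \sum_{k \in U_N} \sum_{l \in U_N} \frac{Y_k(s)}{\pi_k} \frac{Y_l(t)}{\pi_l}\, \Delta_{kl},$$

with $\Delta_{kl} = \pi_{kl} - \pi_k \pi_l$ if $k \neq l$ and $\Delta_{kk} = \pi_k (1 - \pi_k)$. An unbiased estimator of $\gamma_N(s,t)$, for
all $(s,t) \in [0,T] \times [0,T]$, is

$$\hat\gamma_N(s,t) = \frac{1}{N^2} \sum_{k \in s} \sum_{l \in s} \frac{Y_k(s)}{\pi_k} \frac{Y_l(t)}{\pi_l}\, \frac{\Delta_{kl}}{\pi_{kl}}.$$
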
With real data, such as the electricity consumption trajectories presented in Figure 1,
we do not observe $Y_k(t)$ at every instant $t$ in $[0,T]$ but only have an evaluation of $Y_k$ at
$d$ discretization points $0 = t_1 < \cdots < t_d = T$. Assuming that there are no measurement
errors, which seems realistic in the case of electricity consumption curves, and that the
trajectories are regular enough, linear interpolation is a robust and simple way to obtain
accurate approximations of the trajectories at every instant $t$. For each unit $k$ in the sample
$s$, the interpolated trajectory is defined by

$$\tilde Y_k(t) = Y_k(t_i) + \frac{Y_k(t_{i+1}) - Y_k(t_i)}{t_{i+1} - t_i}\,(t - t_i), \qquad t \in [t_i, t_{i+1}]. \tag{3}$$
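The interpolation rule (3) can be sketched in a few lines. The function name `interpolate` and the toy grid and values are assumptions chosen for illustration; the grid is assumed to be sorted in increasing order.

```python
def interpolate(times, values, t):
    """Linear interpolation of one discretized trajectory, following equation (3).

    times  : increasing discretization points t_1 < ... < t_d
    values : observed values Y_k(t_1), ..., Y_k(t_d)
    t      : instant in [t_1, t_d] at which the trajectory is evaluated
    """
    if not times[0] <= t <= times[-1]:
        raise ValueError("t must lie in [t_1, t_d]")
    for i in range(len(times) - 1):
        if times[i] <= t <= times[i + 1]:
            slope = (values[i + 1] - values[i]) / (times[i + 1] - times[i])
            return values[i] + slope * (t - times[i])

# Toy trajectory observed at three points (hypothetical values).
times = [0.0, 0.5, 1.0]
values = [2.0, 4.0, 3.0]
y_quarter = interpolate(times, values, 0.25)   # halfway between 2.0 and 4.0
y_three_quarters = interpolate(times, values, 0.75)
```

At the discretization points the interpolated trajectory coincides with the observed values, so no information is lost on the grid.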

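It is then possible to define the Horvitz-Thompson estimator of the mean curve based on
the discretized observations as

$$\hat\mu_d(t) = \frac{1}{N} \sum_{k \in s} \frac{\tilde Y_k(t)}{\pi_k}, \qquad t \in [0,T]. \tag{4}$$

The covariance function of $\hat\mu_d$, denoted by $\gamma_d(s,t) = \mathrm{cov}\{\hat\mu_d(s), \hat\mu_d(t)\}$, also satisfies for all
$(s,t) \in [0,T] \times [0,T]$,

$$\gamma_d(s,t) = \frac{1}{N^2} \sum_{k \in U_N} \sum_{l \in U_N} \frac{\tilde Y_k(s)}{\pi_k} \frac{\tilde Y_l(t)}{\pi_l}\, \Delta_{kl}, \tag{5}$$

and, as above, an unbiased estimator of $\gamma_d(s,t)$ is

$$\hat\gamma_d(s,t) = \frac{1}{N^2} \sum_{k \in s} \sum_{l \in s} \frac{\tilde Y_k(s)}{\pi_k} \frac{\tilde Y_l(t)}{\pi_l}\, \frac{\Delta_{kl}}{\pi_{kl}}. \tag{6}$$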

To go further we must adopt an asymptotic point of view assuming that the size N of 
the population grows to infinity. 



3 Asymptotic Properties 
3.1 Assumptions 

Let us consider the superpopulation asymptotic framework introduced by Isaki & Fuller 
(1982) and discussed in detail in Fuller (2009). We consider a sequence of growing and 
nested populations $U_N$ with size $N$ tending to infinity and a sequence of samples $s_N$ of
size $n_N$ drawn from $U_N$ according to the fixed-size sampling designs $p_N(s_N)$. Let us denote
by $\pi_{kN}$ and $\pi_{klN}$ their first and second order inclusion probabilities. The sequence of
subpopulations is an increasing nested one while the sample sequence is not. For simplicity of
notation, we drop the subscript $N$ in the following when there is no ambiguity. To prove
our asymptotic results, we make the following assumptions. 

Assumption 1. We assume that $\lim_{N \to \infty} n/N = \pi \in\, ]0,1[$.

Assumption 2. We assume that $\min_{k} \pi_k > \lambda > 0$, $\min_{k \neq l} \pi_{kl} > \lambda^* > 0$ and
$\limsup_{N \to \infty} n \max_{k \neq l} |\pi_{kl} - \pi_k \pi_l| < C_1 < \infty$.

Assumption 3. For all $k \in U$, $Y_k \in C[0,T]$, the space of continuous functions on $[0,T]$, and
$\lim_{N \to \infty} \mu_N = \mu$ in $C[0,T]$.

Assumption 4. There are two positive constants $C_2$ and $C_3$ and $\beta > 1/2$ such that,
for all $N$, $N^{-1} \sum_{k \in U} Y_k(0)^2 < C_2$ and $N^{-1} \sum_{k \in U} \{Y_k(t) - Y_k(s)\}^2 < C_3 |t - s|^{2\beta}$ for all
$(s,t) \in [0,T] \times [0,T]$.

Assumptions 1 and 2 concern the moment properties of the sampling designs and are 
fulfilled for sampling plans such as simple random sampling without replacement or stratified 
sampling (Robinson & Särndal, 1983; Breidt & Opsomer, 2000). Assumptions 3 and 4 are
of a functional nature and seem to be rather weak. Assumption 3 imposes only that the
limit of the mean function exists and is continuous, and Assumption 4 states that the
trajectories have a uniformly bounded second moment and their mean squared increments
satisfy a Hölder condition.

3.2 Consistency 

We can now state the first consistency results, assuming that the grid of the $d_N$ discretization
points becomes finer and finer in $[0,T]$ as the population size $N$ tends to infinity.

Proposition 3.1. Let Assumptions 1-4 hold. If the discretization scheme satisfies
$\lim_{N \to \infty} \max_{i \in \{1,\ldots,d_N - 1\}} |t_{i+1} - t_i|^{2\beta} = o(n^{-1})$, then for some constant $C$,

$$\sqrt{n}\; E\Big\{ \sup_{t \in [0,T]} |\hat\mu_d(t) - \mu_N(t)| \Big\} < C.$$



Proposition 3.1 states that if the grid is fine enough then classical parametric rates of
convergence can be attained uniformly, the additional hypothesis meaning that for smoother
trajectories, i.e. larger $\beta$, fewer discretization points are needed. We would also like to
obtain that $\hat\gamma_d(t,t)$ is a consistent estimator of the variance function $\gamma_N(t,t)$. To do so, we
need to introduce additional assumptions concerning the higher-order inclusion probabilities
and the fourth order moments of the trajectories.

Assumption 5. We assume that

$$\lim_{N \to \infty} \max_{(i_1,i_2,i_3,i_4) \in D_{4,N}} \big| E\{(I_{i_1} I_{i_2} - \pi_{i_1 i_2})(I_{i_3} I_{i_4} - \pi_{i_3 i_4})\} \big| = 0,$$

where $D_{t,N}$ denotes the set of all distinct $t$-tuples $(i_1, \ldots, i_t)$ from $U_N$.

We also suppose that there are two positive constants $C_4$ and $C_5$, such that
$N^{-1} \sum_{k \in U_N} Y_k(0)^4 < C_4$, and $N^{-1} \sum_{k \in U_N} \{Y_k(t) - Y_k(s)\}^4 < C_5 |t - s|^{4\beta}$, for all
$(s,t) \in [0,T] \times [0,T]$.

The first part of Assumption 5 is more restrictive than Assumption 2 and is assumed,
for example, in Breidt & Opsomer (2000, part of assumption (A7)). It holds, for instance,
in simple random sampling without replacement and stratified sampling.

Proposition 3.2. Let Assumptions 1-5 hold. If the discretization scheme satisfies
$\lim_{N \to \infty} \max_{i \in \{1,\ldots,d_N - 1\}} |t_{i+1} - t_i| = o(1)$, then

$$n\, E\Big\{ \sup_{t \in [0,T]} |\hat\gamma_d(t,t) - \gamma_N(t,t)| \Big\} \to 0, \qquad N \to \infty.$$



The multiplier $n$ that appears in Proposition 3.2 is due to the fact that $n\gamma_N(t,t)$ is
a bounded function. Proposition 3.2 only states that we can obtain a uniformly consistent
estimator of the variance function of the estimated mean trajectory. More restrictive
conditions concerning the sampling design would be needed to get rates of convergence.



3.3 Asymptotic normality and confidence bands 

Proceeding further, we would now like to derive the asymptotic distribution of our estimator
$\hat\mu_d$ in order to build asymptotic confidence intervals and bands. Obtaining the asymptotic
normality of estimators in survey sampling is a technical and difficult issue even for simple
quantities such as means or totals of real numbers. Although confidence intervals are
commonly used in the survey sampling community, the Central Limit Theorem has only
been checked rigorously, as far as we know, for a few sampling designs. Erdős & Rényi
(1959) and Hájek (1960) proved that the Horvitz-Thompson estimator is asymptotically
Gaussian for simple random sampling without replacement. These results were extended
more recently to stratified sampling by Bickel & Freedman (1994) and some particular cases
of two-phase sampling designs by Chen & Rao (2007). Fuller (2009, §1.3) provides a recent
review. Let us assume that the Horvitz-Thompson estimator satisfies a Central Limit
Theorem for real valued quantities with new moment conditions.

Assumption 6. There is some $\delta > 0$, such that $N^{-1} \sum_{k \in U_N} |Y_k(t)|^{2+\delta} < \infty$ for all
$t \in [0,T]$, and $\{\gamma_N(t,t)\}^{-1/2} \{\hat\mu_N(t) - \mu_N(t)\} \to N(0,1)$ in distribution when $N$ tends to
infinity.

We can now formulate the following proposition, which tells us that if the sampling 
design is such that the Horvitz-Thompson estimator of the total of real quantities is
asymptotically Gaussian, then our estimator $\hat\mu_d$ is also asymptotically Gaussian in the space of
continuous functions equipped with the sup norm. This means that point-wise normality 
can be transposed, under regularity assumptions on the trajectories and the asymptotic 
distance between adjacent discretization points, to a functional Central Limit Theorem. 

Proposition 3.3. Let Assumptions 1-4 and 6 hold and suppose that the discretization points
satisfy $\lim_{N \to \infty} \max_{i \in \{1,\ldots,d_N - 1\}} |t_{i+1} - t_i|^{2\beta} = o(n^{-1})$. We then have that

$$\sqrt{n}\,(\hat\mu_d - \mu_N) \to X \quad \text{in distribution in } C[0,T],$$

where $X$ is a Gaussian random function taking values in $C[0,T]$ with mean 0 and covariance
function $\gamma(s,t) = \lim_{N \to \infty} n\,\gamma_N(s,t)$.

The proof, given in the Appendix, is based on the Cramer-Wold device which gives 
access to multivariate normality when considering discretized trajectories. Tightness ar- 
guments are then invoked in order to obtain the functional version of the Central Limit 
Theorem. 

Using heuristic arguments similar to those of Degras (2009), we can also build asymptotic
confidence bands in order to evaluate the global accuracy of our estimator. To do so, we
make use of an asymptotic result from Landau & Shepp (1970), which states that the
supremum of a centred Gaussian random function $Z$ taking values in $C[0,T]$, with covariance
function $\rho(s,t)$, satisfies

$$\lim_{\lambda \to \infty} \lambda^{-2} \log P\Big\{ \sup_{t \in [0,T]} Z(t) > \lambda \Big\} = -\Big\{ 2 \sup_{t \in [0,T]} \rho(t,t) \Big\}^{-1}. \tag{7}$$

Assuming that $\inf_t \gamma(t,t) > 0$, it is easy to prove, with Slutsky's Lemma and Propositions
3.2 and 3.3, that the sequence of random functions $Z_n(t) = \{\hat\gamma_d(t,t)\}^{-1/2} \{\hat\mu_d(t) - \mu_N(t)\}$
satisfies the Central Limit Theorem in $C[0,T]$ and converges in distribution to $Z(t)$. Then,
the continuous mapping theorem tells us that, for each $\lambda > 0$, $P\{\sup_t |Z_n(t)| > \lambda\}$ converges
to $P\{\sup_t |Z(t)| > \lambda\}$. Applying (7) to $Z_n$, a direct computation yields that, for a given risk
$\alpha > 0$,

$$P\Big[\, |\hat\mu_d(t) - \mu_N(t)| < \{2 \log(2/\alpha)\, \hat\gamma_d(t,t)\}^{1/2},\ t \in [0,T] \,\Big] \simeq 1 - \alpha. \tag{8}$$

Equation (8) indicates that, compared to point-wise confidence intervals, global ones
can be obtained simply by replacing the scaling given by the quantile of a standard
Gaussian variable by the factor $\{2\log(2/\alpha)\}^{1/2}$. For example, if $\alpha = 0.05$,
respectively $\alpha = 0.01$, then $\{2\log(2/\alpha)\}^{1/2} = 2.716$, respectively 3.255, instead of 1.960,
respectively 2.576, for a point-wise confidence interval with 0.95 confidence, respectively 0.99.
The result presented in equation (8) is asymptotic and is therefore more reliable when $\alpha$ is
close to zero as seen in our simulation study.
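The scaling factors quoted above are easy to reproduce. The following sketch, in which the function names and the toy estimates are assumptions introduced for illustration, computes the global factor of equation (8), the usual point-wise Gaussian quantile, and the resulting band from an estimated mean curve and an estimated variance function.

```python
import math
from statistics import NormalDist

def band_factor(alpha):
    """Global scaling factor {2 log(2 / alpha)}^(1/2) from equation (8)."""
    return math.sqrt(2.0 * math.log(2.0 / alpha))

def pointwise_factor(alpha):
    """Two-sided standard Gaussian quantile used for point-wise intervals."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def confidence_band(mu_hat, var_hat, alpha):
    """Lower and upper limits mu_hat(t) -/+ factor * sqrt(var_hat(t)) on the grid."""
    c = band_factor(alpha)
    lower = [m - c * math.sqrt(v) for m, v in zip(mu_hat, var_hat)]
    upper = [m + c * math.sqrt(v) for m, v in zip(mu_hat, var_hat)]
    return lower, upper

factor_95 = band_factor(0.05)        # about 2.716
factor_99 = band_factor(0.01)        # about 3.255
quantile_95 = pointwise_factor(0.05) # about 1.960

# Toy estimates on a three-point grid (hypothetical values).
lower, upper = confidence_band([10.0, 12.0, 11.0], [0.04, 0.09, 0.01], 0.05)
```

The band is wider than the point-wise interval by the ratio of the two factors, roughly 1.39 at the 0.95 level.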



4 Stratified sampling designs 

We now consider the particular case of stratified sampling with simple random sampling
without replacement in all strata, assuming the population $U$ is divided into a fixed number
$H$ of strata. This means that there is a partitioning of $U$ into $H$ sub-populations
denoted by $U_h$ $(h = 1, \ldots, H)$. We can define the mean curve $\mu_h$ within each stratum
$h$ as $\mu_h(t) = N_h^{-1} \sum_{k \in U_h} Y_k(t)$, $t \in [0,T]$, where $N_h$ is the number of units in
stratum $h$. The covariance function, $\gamma_h(s,t)$, within stratum $h$ is defined by

$$\gamma_h(s,t) = \frac{1}{N_h} \sum_{k \in U_h} \{Y_k(s) - \mu_h(s)\}\{Y_k(t) - \mu_h(t)\}, \qquad (s,t) \in [0,T] \times [0,T].$$

In stratified sampling with simple random sampling without replacement in all strata,
the first and second order inclusion probabilities are explicitly known, and the mean curve
estimator of $\mu_N(t)$ is

$$\hat\mu_{\mathrm{strat}}(t) = \frac{1}{N} \sum_{h=1}^{H} \frac{N_h}{n_h} \sum_{k \in s_h} Y_k(t), \qquad t \in [0,T],$$

where $s_h$ is a sample of size $n_h$, with $n_h < N_h$, obtained by simple random sampling without
replacement in stratum $U_h$. The covariance function of $\hat\mu_{\mathrm{strat}}$ can be expressed as

$$\gamma_{\mathrm{strat}}(s,t) = \frac{1}{N^2} \sum_{h=1}^{H} N_h\, \frac{N_h - n_h}{n_h}\, \tilde\gamma_h(s,t), \qquad (s,t) \in [0,T] \times [0,T],$$

with $(N_h - 1)\,\tilde\gamma_h(s,t) = N_h\, \gamma_h(s,t)$.
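A short sketch of the stratified mean curve estimator may help fix ideas: within each stratum the sample mean curve is computed and then weighted by the stratum size. The function name `stratified_mean` and the toy strata are assumptions made for the example.

```python
def stratified_mean(strata_samples, stratum_sizes):
    """Stratified estimate of the mean curve under SRS without replacement in each stratum.

    strata_samples : list over strata; entry h is the list of sampled curves in stratum h
    stratum_sizes  : population sizes N_h of the strata
    """
    N = sum(stratum_sizes)
    d = len(strata_samples[0][0])
    estimate = [0.0] * d
    for curves_h, N_h in zip(strata_samples, stratum_sizes):
        n_h = len(curves_h)
        for j in range(d):
            # N_h times the sample mean of stratum h at grid point j
            estimate[j] += N_h * sum(curve[j] for curve in curves_h) / n_h
    return [value / N for value in estimate]

# Two toy strata observed at two grid points (hypothetical values).
strata_samples = [
    [[1.0, 2.0], [3.0, 4.0]],   # two curves sampled from a stratum of size 6
    [[10.0, 20.0]],             # one curve sampled from a stratum of size 2
]
stratum_sizes = [6, 2]
mu_strat = stratified_mean(strata_samples, stratum_sizes)
```

Here the stratum means are (2, 3) and (10, 20), so the estimate is their average weighted by 6/8 and 2/8.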

For real valued quantities, optimal allocation rules, which determine the sizes of
the samples in all the strata, are generally defined in order to obtain an estimator whose
variance is as small as possible. In our functional context, and as in the multivariate case
(Cochran, 1977, §5A.2), determining an optimal allocation clearly depends on the criterion
to be minimized. Indeed, one could consider many different optimization criteria which
would lead to different optimal allocation rules. The width of the global confidence bands
derived in equation (8) depends only on the standard deviation of the estimator at each
instant $t$, and minimising the width at the worst instant of time or minimising the average
width along time are natural criteria. Nevertheless, finding the solution of such optimization
problems is not trivial and is not investigated further in this paper. If we consider the optimal
allocation based on minimising the mean variance instead of the mean standard deviation,
we can then find explicit and simple solutions to

$$\min_{(n_1,\ldots,n_H)} \int_0^T \gamma_{\mathrm{strat}}(t,t)\, dt \quad \text{subject to} \quad \sum_{h=1}^{H} n_h = n \ \text{ and } \ n_h > 0,\ h = 1, \ldots, H. \tag{9}$$

The solution is

$$n_h^* = n\, \frac{N_h S_h}{\sum_{i=1}^{H} N_i S_i}, \tag{10}$$

with $S_h^2 = \int_0^T \tilde\gamma_h(t,t)\, dt$, $h = 1, \ldots, H$, similar to that of the multivariate case when
considering a total variance criterion (Cochran, 1977). This means that a stratum with higher
variance than the others should be sampled at a higher sampling rate $n_h/N_h$. The gain when
considering optimal allocation compared to proportional allocation, i.e. $n_h = nN_h/N$, can
also be derived easily.
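The allocation rule (10) and the criterion (9) can be compared numerically. In the sketch below the stratum sizes, the values of $S_h$ and the function names are hypothetical; allocations are left as real numbers, so rounding to integers and the constraint $n_h \le N_h$ are deliberately not handled.

```python
def optimal_allocation(n, stratum_sizes, stratum_sds):
    """Allocation n_h proportional to N_h * S_h, following rule (10)."""
    weights = [N_h * S_h for N_h, S_h in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    return [n * w / total for w in weights]

def proportional_allocation(n, stratum_sizes):
    """Allocation n_h = n * N_h / N."""
    N = sum(stratum_sizes)
    return [n * N_h / N for N_h in stratum_sizes]

def mean_variance(stratum_sizes, sample_sizes, stratum_sds):
    """Integrated variance of the stratified estimator, the criterion in (9):
    (1 / N^2) * sum_h N_h * (N_h - n_h) / n_h * S_h^2."""
    N = sum(stratum_sizes)
    total = 0.0
    for N_h, n_h, S_h in zip(stratum_sizes, sample_sizes, stratum_sds):
        total += N_h * (N_h - n_h) / n_h * S_h ** 2
    return total / N ** 2

# Four equal-sized strata with increasing variability (hypothetical values).
stratum_sizes = [1000, 1000, 1000, 1000]
stratum_sds = [1.0, 2.0, 3.0, 10.0]
n = 400

n_opt = optimal_allocation(n, stratum_sizes, stratum_sds)   # 25, 50, 75, 250
n_prop = proportional_allocation(n, stratum_sizes)          # 100 in each stratum
crit_opt = mean_variance(stratum_sizes, n_opt, stratum_sds)
crit_prop = mean_variance(stratum_sizes, n_prop, stratum_sds)
```

In this toy setting the most variable stratum receives more than half of the sample, and the integrated variance is roughly halved compared with proportional allocation.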

5 An illustration with electricity consumption 

Over the next few years Electricite De France plans to install millions of sophisticated elec- 
tricity meters that will be able to send, on request, electricity consumption measurements 
every second. Empirical studies have shown that even the simplest survey sampling strategies,
such as simple random sampling without replacement, are very competitive with signal
processing approaches such as wavelet expansions, when the aim is to estimate the mean 
consumption curve. To test and compare the different possible strategies, a test population 
of N = 18902 electricity meters has been installed in small and large companies. These 
electricity meters have read electricity consumption every half an hour over a period of two 
weeks. 

We split the temporal observations and considered only the second week for estimation.
The readings from the first week were used to build the strata. Thus, our population of curves
is a set of N = 18902 vectors $Y_k = (Y_k(t_1), \ldots, Y_k(t_d))$ of size $d = 336$. Identifying
each unit $k$ of the population with its trajectory $Y_k$, we now consider a particular case of
stratified sampling which consists in clustering the space $C[0,T]$ of all possible trajectories
into a fixed number $H$ of strata.



Figure 2: (a) Mean curve in each stratum. (b) Theoretical standard deviation function
$\{\gamma(t,t)\}^{1/2}$ for simple random sampling without replacement (solid line), stratified sampling
with proportional allocation (dashed line) and stratified sampling with optimal allocation
(dotted dashed line) sampling designs.

The strata were built by clustering the population according to the maximum level of 
consumption during the first week. We decided to retain H = 4 different clusters based on 
the quartiles so that all the strata have the same size. The mean trajectories during the first 
week in the clusters, drawn in Figure 2(a), show a clear size effect. The strata have been
numbered according to global mean consumption. Stratum 4, at the top of Figure 2(a),
corresponds to consumers with high global levels of consumption whereas stratum 1, at the
bottom of Figure 2(a), corresponds to consumers with low global levels of consumption.

We compared three sampling strategies, with the same sample size $n = 2000$, to estimate
the mean population curve $\mu(t)$ and build confidence bands during the second week. In order
to evaluate these estimators, we drew 1000 samples using the following sampling designs:

SRSWR simple random sampling without replacement, which was first tested by
Electricite de France;

Proportional stratified sampling with proportional allocation, in which the allocation in each stratum
is defined as $n_h = nN_h/N$; the sample size in each stratum is 500;

Optimal stratified sampling with optimal allocation according to the rule defined in (10). The
sample sizes in the strata are 126 (stratum 1), 212 (stratum 2), 333 (stratum 3) and 1329
(stratum 4).

To evaluate the accuracy of the estimators, we considered the following loss criteria,
evaluated with discretized data using quadrature rules, for the estimator $\hat\mu$, respectively $\hat\gamma$,
of the mean trajectory, respectively of the variance function,

$$R(\hat\mu) = \int_0^T |\hat\mu(t) - \mu(t)|\, dt, \qquad R(\hat\gamma) = \int_0^T |\hat\gamma(t,t) - \gamma(t,t)|\, dt. \tag{11}$$
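The criteria in (11) have to be approximated on the discretization grid. The text does not say which quadrature rule was used, so the sketch below assumes the trapezoidal rule; the function names and the toy curves are likewise hypothetical.

```python
def trapezoid(times, values):
    """Trapezoidal approximation of the integral of a function sampled on a grid."""
    return sum(
        0.5 * (values[i] + values[i + 1]) * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )

def l1_loss(times, estimate, target):
    """Approximation of the criterion in (11): integral of |estimate(t) - target(t)|."""
    abs_error = [abs(a - b) for a, b in zip(estimate, target)]
    return trapezoid(times, abs_error)

# Toy curves on a three-point grid (hypothetical values).
times = [0.0, 1.0, 2.0]
estimate = [1.0, 2.0, 3.0]
target = [1.0, 1.0, 1.0]
loss = l1_loss(times, estimate, target)   # absolute errors 0, 1, 2 integrate to 2
```

The same function applies to the variance criterion by passing the estimated and true variance functions evaluated on the grid.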

Basic statistics for the estimation errors of the mean function are given in Table 1. First,
we observe that clustering the space of functions by means of stratified sampling leads to 
a large gain in terms of the accuracy of the estimators. In addition, there is a substantial 
difference between the proportional and the optimal allocation rules. 



Table 1: Estimation errors for $\mu$ and $\gamma(t,t)$ for the different sampling designs.

                          Mean function                           Variance function
              Mean  1st quartile  Median  3rd quartile   Mean  1st quartile  Median  3rd quartile
SRSWR         4.46      2.37       3.75       5.68       5.26      2.42       4.04
Proportional  3.48      2.03       2.87       4.43       4.77      2.07       3.51
Optimal       2.43      1.55       2.10       3.04       1.02      0.56       0.88



We now examine the true standard deviation functions $\{\gamma(t,t)\}^{1/2}$, which are proportional
to the width of the confidence bands. They depend on the sampling design and are drawn
in Figure 2(b). The theoretical standard deviation is much smaller, at all instants $t$, for the
optimal allocation rule, and it is about half that of simple random sampling
without replacement. There is also a strong periodicity effect in simple random sampling
without replacement due to the lack of control over the units with high levels of consumption
(stratum 4). Estimation errors, according to criterion (11), of the true covariance functions
are reported in Table 1. The error is much smaller for stratified optimal allocation than
for the other estimators; optimal allocation provides better estimates as well as better
estimation of their variance.

Finally, we computed the global confidence bands to check that formula (8), which
relies on asymptotic properties of the supremum of Gaussian processes, remains valid when
considering confidence levels 0.95 and 0.99. The empirical coverage is close to the nominal
one for the simple random sampling without replacement, 93.8% and 98.3%, whereas it is a
little bit liberal, especially for smaller levels, for the stratified sampling designs, 88.7% and
96.8% for proportional allocation, and 88.1% and 96.8% for optimal allocation.



6 Concluding remarks 

The experimental results on a test population of electricity consumption curves confirm that 
stratification, in conjunction with the optimal allocation rule, can lead, in cases of such high 
dimensional data, to important gains in terms of the accuracy of the estimation and width 
of the global confidence bands compared to more basic approaches. We have proposed a 
simple rule to get confidence bands that could certainly be improved, in terms of empirical 
coverage, by computing more realistic scaling factors with bootstrap procedures (Faraway, 
1997) or Gaussian process simulations (Degras, 2010). 

Choosing appropriate strata is also an important aspect of such improvement. Nev- 
ertheless, it will generally be impossible to determine for all units to which cluster they 
belong. Borrowing ideas from Breidt & Opsomer (2008), one possible strategy is to per- 
form clustering on the observed sample and then try to predict to which stratum the units 
that are not in the sample belong using auxiliary information and supervised classification. 

We have assumed that the observed trajectories are not corrupted by noise at the dis- 
cretization points. Although this assumption seems quite reasonable in the case of electricity 
consumption measurements, it is not true in general. Thus, linear interpolation may not 
always be effective and linear smoother estimators, such as kernels or smoothing splines,
would probably be more appropriate ways to obtain functional versions of the discretized 
observations. 

Finally, another direction for future research is to combine optimal allocation for strat- 
ification with model-assisted estimation when auxiliary information is available. There are 
close relationships between the shape of electricity consumption curves and variables such 
as past consumption, temperature, household area or type of electricity contract. Such an 
estimation procedure relies, as noted in Cardot et al. (2010), on a parsimonious represen- 
tation of the trajectories in order to reduce the dimension of the data. One way to achieve 
this is to first perform a functional principal components analysis and then to model the 
relationship between the principal components and the auxiliary information. 
Appendix: proofs



Proof of Proposition 3.1. We study approximation and sampling errors separately:

$$\sup_{t \in [0,T]} |\hat\mu_d(t) - \mu_N(t)| \le \sup_{t \in [0,T]} |\hat\mu_d(t) - \hat\mu_N(t)| + \sup_{t \in [0,T]} |\hat\mu_N(t) - \mu_N(t)|. \tag{12}$$

Suppose $t \in [t_i, t_{i+1}[$; then $|\tilde Y_k(t) - Y_k(t)| \le |Y_k(t_i) - Y_k(t_{i+1})| + |Y_k(t) - Y_k(t_i)|$. By
Assumptions 1-2 and an application of the Cauchy-Schwarz inequality,

$$|\hat\mu_d(t) - \hat\mu_N(t)| \le \frac{1}{N} \sum_{k \in s} \frac{|\tilde Y_k(t) - Y_k(t)|}{\pi_k} \le \frac{1}{\min_{k \in U_N} \pi_k} \Big[ \frac{1}{N} \sum_{k \in U_N} \{\tilde Y_k(t) - Y_k(t)\}^2 \Big]^{1/2} \le \frac{1}{\lambda}\, C_6\, |t_{i+1} - t_i|^{\beta}$$

for some positive constant $C_6$ which does not depend on $t$. Consequently,

$$\sqrt{n} \sup_{t \in [0,T]} |\hat\mu_d(t) - \hat\mu_N(t)| \le \sqrt{n}\, \frac{C_6}{\lambda} \max_{i \in \{1,\ldots,d_N - 1\}} |t_{i+1} - t_i|^{\beta}. \tag{13}$$

We now study the sampling error. Consider the pseudo-metric

$$d_N^2(s,t) = n\, E\{\hat\mu_N(t) - \mu_N(t) - \hat\mu_N(s) + \mu_N(s)\}^2$$

for all $(s,t) \in [0,T] \times [0,T]$. We have, for some constant $C_7$,

$$d_N^2(s,t) \le \frac{n}{N^2} \sum_{k,l \in U_N} \frac{|\Delta_{kl}|}{\pi_k \pi_l}\, |Y_k(t) - Y_k(s)|\, |Y_l(t) - Y_l(s)| \le \Big\{ \frac{n}{N\lambda} + \frac{n}{\lambda^2} \max_{k \neq l} |\Delta_{kl}| \Big\} \frac{1}{N} \sum_{k \in U_N} \{Y_k(t) - Y_k(s)\}^2 \le C_7\, |t - s|^{2\beta}. \tag{14}$$

We apply a result of van der Vaart and Wellner (2000, §2.2) based on maximal inequalities
to get the uniform convergence and consider the packing number $D(\epsilon, d_N)$, which is
the maximum number of points in $[0,T]$ whose distance between each pair is strictly larger
than $\epsilon$. It is clear from (14) that $D(\epsilon, d_N) = O(\epsilon^{-1/\beta})$. Considering now the particular Orlicz
norm with $\psi(x) = x^2$ in Theorem 2.2.4 of van der Vaart and Wellner (2000), we directly
find that $\int_0^T \psi^{-1}(\epsilon^{-1/\beta})\, d\epsilon < \infty$ when $\beta > 1/2$, and consequently there is a constant $C_8$
such that

$$E\Big\{ \sqrt{n} \sup_{s,t} |\hat\mu_N(t) - \mu_N(t) - \hat\mu_N(s) + \mu_N(s)| \Big\} < C_8. \tag{15}$$

Since $\sup_t |\hat\mu_N(t) - \mu_N(t)| \le |\hat\mu_N(0) - \mu_N(0)| + \sup_{s,t} |\hat\mu_N(t) - \mu_N(t) - \hat\mu_N(s) + \mu_N(s)|$,
we get the announced result with (12), (13) and (15). □



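Proof of Proposition 3.2. The proof follows the same lines as the proof of Proposition 3.1. Let us
first write,

$$\sup_{t \in [0,T]} |\hat\gamma_d(t,t) - \gamma_N(t,t)| \le \sup_{t \in [0,T]} |\hat\gamma_d(t,t) - \hat\gamma_N(t,t)| + \sup_{t \in [0,T]} |\hat\gamma_N(t,t) - \gamma_N(t,t)|. \tag{16}$$

Suppose $t \in [t_i, t_{i+1}[$ and define $\delta_{kl}(t) = |\tilde Y_l(t) - Y_l(t)|\, |Y_k(t)|$. With Assumptions 1-3, we
have, for some constants $C_9$ and $C_{10}$,

$$|\hat\gamma_d(t,t) - \hat\gamma_N(t,t)| \le \frac{C_9}{N^2} \Big[ \sum_{k \in U_N} |\tilde Y_k^2(t) - Y_k^2(t)| + \max_{k \neq l} |\Delta_{kl}| \sum_{k \in U_N} \sum_{l \in U_N} \{\delta_{kl}(t) + \delta_{lk}(t)\} \Big] \le \frac{C_{10}}{N}\, |t_{i+1} - t_i|^{\beta}.$$

Thus, using Assumption 1,

$$n \sup_{t \in [0,T]} |\hat\gamma_d(t,t) - \hat\gamma_N(t,t)| \le C_{10} \max_{i \in \{1,\ldots,d_N - 1\}} |t_{i+1} - t_i|^{\beta}. \tag{17}$$

Consider now the sampling error and define, for $(s,t) \in [0,T] \times [0,T]$,
$d_N^2(s,t) = n^2 E\{\hat\gamma_N(t,t) - \gamma_N(t,t) - \hat\gamma_N(s,s) + \gamma_N(s,s)\}^2$ and $\phi_{kl}(s,t) = Y_k(t) Y_l(t) - Y_k(s) Y_l(s)$. We
have

$$d_N^2(s,t) = \frac{n^2}{N^4} \sum_{k,l \in U_N} \sum_{k',l' \in U_N} \phi_{kl}(s,t)\, \phi_{k'l'}(s,t)\, \frac{\Delta_{kl}}{\pi_k \pi_l}\, \frac{\Delta_{k'l'}}{\pi_{k'} \pi_{l'}}\, E\Big\{ \Big( \frac{I_k I_l}{\pi_{kl}} - 1 \Big) \Big( \frac{I_{k'} I_{l'}}{\pi_{k'l'}} - 1 \Big) \Big\}.$$

Following the same lines as the proof of Theorem 3 in Breidt & Opsomer (2000), we get
after some algebra that, for some constant $C_{11}$,

$$d_N^2(s,t) \le C_{11} \Big[ n^{-1} + \max_{(k,l,k',l') \in D_{4,N}} \big| E\{(I_k I_l - \pi_{kl})(I_{k'} I_{l'} - \pi_{k'l'})\} \big| \Big]\, |t - s|^{2\beta}. \tag{18}$$

Applying again a maximal inequality as in the proof of Proposition 3.1, we get the
announced result. □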



Proof of Proposition 3.3. Noting that, with (13), $\sqrt{n}\{\hat\mu_d(t) - \mu_N(t)\} = \sqrt{n}\{\hat\mu_N(t) - \mu_N(t)\} + o(1)$,
uniformly in $t$, we only need to study the asymptotic distribution of the random
function $X_n(t) = \sqrt{n}\{\hat\mu_N(t) - \mu_N(t)\}$, for $t \in [0,T]$.

We first consider an $m$-tuple $(t_1, \ldots, t_m) \in [0,T]^m$, a vector $c^T = (c_1, \ldots, c_m) \in R^m$ and
prove that $\sum_{i=1}^{m} c_i X_n(t_i)$ is asymptotically Gaussian for all $c \in R^m$. Considering
$Y_{kc} = \sum_{i=1}^{m} c_i Y_k(t_i)$, it is clear, with Assumption 6, that $N^{-1} \sum_{k \in U} |Y_{kc}|^{2+\delta} < \infty$ and we have

$$\sum_{i=1}^{m} c_i X_n(t_i) = \sqrt{n}\, \Big\{ \frac{1}{N} \sum_{k \in s} \frac{Y_{kc}}{\pi_k} - \sum_{i=1}^{m} c_i\, \mu_N(t_i) \Big\}.$$

Denoting by $\hat\mu_c = N^{-1} \sum_{k \in s} \pi_k^{-1} Y_{kc}$ the Horvitz-Thompson estimator of $\mu_c = N^{-1} \sum_{k \in U} Y_{kc}$,
it is clear that $\mu_c = \sum_{i=1}^{m} c_i\, \mu_N(t_i)$, $E(\hat\mu_c) = \mu_c$, and with Assumption 6, $\sqrt{n}\{\hat\mu_c - E(\hat\mu_c)\}$
converges in distribution to $N(0, c^T M c)$ where $M$ is a covariance matrix with generic elements
$[M]_{ij} = \gamma(t_i, t_j)$. The Cramer-Wold device tells us that the vector $(X_n(t_1), \ldots, X_n(t_m))$ is
asymptotically multivariate normal.

Secondly, we need to check that $X_n$ satisfies a tightness property in order to get the
asymptotic convergence in distribution in the space of continuous functions $C[0,T]$. We
have with (14), for all $(s,t) \in [0,T] \times [0,T]$, $E\{|X_n(t) - X_n(s)|^2\} \le C_7 |t - s|^{2\beta}$, and the
sequence $X_n$ is tight, when $\beta > 1/2$, according to Theorem 12.3 of Billingsley (1968). □